On Lloyd’s k-means Method

Authors

  • Sariel Har-Peled
  • Bardia Sadri

Abstract

We present polynomial upper and lower bounds on the number of iterations performed by Lloyd’s method for k-means clustering. Our upper bounds are polynomial in the number of points, the number of clusters, and the spread of the point set. We also present a lower bound, showing that in the worst case the k-means heuristic needs to perform Ω(n) iterations, for n points on the real line and two centers. Surprisingly, the spread of our construction is polynomial. This is the first construction showing that the k-means heuristic requires more than a polylogarithmic number of iterations. Furthermore, we present two alternative algorithms, with guaranteed performance, which are simple variants of Lloyd’s method. Results of our experimental studies on these algorithms are also presented.
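To make the notion of an iteration count concrete, the following is a minimal Python sketch of Lloyd’s method for points on the real line. It is an illustration only, not the lower-bound construction or the variants described in the paper: the function name `lloyd_1d`, the tie-breaking rule (lowest center index wins), the handling of empty clusters (the center stays where it is), and the sample data are all assumptions made for this example.

```python
def lloyd_1d(points, centers, max_iter=10_000):
    """Run Lloyd's method on 1-D points.

    Returns (final_centers, iterations), where `iterations` counts the
    rounds in which the assignment of points to centers changed.
    """
    centers = list(centers)
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        # Ties are broken toward the lowest center index (an assumption).
        new_assignment = [
            min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:
            # Nothing moved in this round, so the method has converged.
            return centers, iteration - 1
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its old center (an assumption).
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, max_iter


if __name__ == "__main__":
    # Hypothetical toy input: two well-separated groups, poor initial centers.
    pts = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
    final_centers, rounds = lloyd_1d(pts, centers=[0.0, 1.0])
    print(final_centers, rounds)
```

On this toy input the assignment changes in two rounds before it stabilizes. The lower bound in the abstract concerns carefully constructed inputs on which the number of such rounds grows linearly in n.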

Related papers

Hartigan's K-Means Versus Lloyd's K-Means - Is It Time for a Change?

Hartigan’s method for k-means clustering holds several potential advantages compared to the classical and prevalent optimization heuristic known as Lloyd’s algorithm. For example, it was recently shown that the set of local minima of Hartigan’s algorithm is a subset of those of Lloyd’s method. We develop a closed-form expression that allows us to establish Hartigan’s method for k-means clustering with an...

On Lloyd's Algorithm: New Theoretical Insights for Clustering in Practice

We provide new analyses of Lloyd’s algorithm (1982), commonly known as the k-means clustering algorithm. Kumar and Kannan (2010) showed that running k-SVD followed by a constant approximation k-means algorithm, and then Lloyd’s algorithm, will correctly cluster nearly all of the dataset with respect to the optimal clustering, provided the dataset satisfies a deterministic clusterability assumpt...

Accelerating Lloyd’s Algorithm for k-Means Clustering

The k-means clustering algorithm, a staple of data mining and unsupervised learning, is popular because it is simple to implement, fast, easily parallelized, and offers intuitive results. Lloyd’s algorithm is the standard batch, hill-climbing approach for minimizing the k-means optimization criterion. It spends a vast majority of its time computing distances between each of the k cluster center...

Hartigan's Method: k-means Clustering without Voronoi

Hartigan’s method for k-means clustering is the following greedy heuristic: select a point, and optimally reassign it. This paper develops two other formulations of the heuristic, one leading to a number of consistency properties, the other showing that the data partition is always quite separated from the induced Voronoi partition. A characterization of the volume of this separation is provide...

Further heuristics for $k$-means: The merge-and-split heuristic and the $(k, l)$-means

The k-means clustering problem asks to partition the data into k clusters so as to minimize the sum of the squared Euclidean distances of the data points to their closest cluster center. Finding the optimal k-means clustering of a d-dimensional data set is NP-hard in general and many heuristics have been designed for minimizing monotonically the k-means objective function. Those heuristics got ...


Publication date: 2007